Noise-matched training of CRF based sentence end detection models

Authors

  • Madina Hasan
  • Rama Doddipatla
  • Thomas Hain

Abstract

Sentence end detection (SED) is an important task for many applications and has been studied on both written text and automatic speech recognition (ASR) transcripts. Previous work showed that conditional random field (CRF) models gave the best SED performance on a range of tasks, both with and without prosodic features. So far, however, true (reference) transcripts have been used for both training and evaluating SED models. On noisy ASR transcripts, SED performance degrades significantly, especially at medium to high ASR error rates. In this work we demonstrate the correlation between SED performance and word error rate (WER) at different ASR system performance levels. We introduce a new method for transferring SED labels onto noisy ASR transcripts, which makes it possible to train noise-matched SED models. The proposed method significantly improves SED performance and yields an 11% relative improvement in slot error rate (SER) compared with models trained on true transcripts. This paper further investigates the effect of noise-matched SED training with different feature sets. The impact of textual features is observed to drop significantly at low ASR performance, whereas prosodic features still have a noticeable impact.
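The abstract does not say how SED labels are carried over from true transcripts onto ASR output, nor how slot error rate is computed. The sketch below illustrates one common way to do both, and it is an assumption, not necessarily the authors' method: reference and ASR words are aligned with a Levenshtein (edit-distance) alignment, each reference sentence end is copied onto its aligned ASR word (or onto the nearest preceding ASR word when the reference word was deleted), and SER is taken as (false alarms + misses) divided by the number of reference sentence ends. The function names `align`, `transfer_labels` and `slot_error_rate` are hypothetical.

```python
from typing import List, Tuple


def align(ref: List[str], hyp: List[str]) -> List[Tuple[int, int]]:
    """Levenshtein alignment of two word lists.

    Returns (ref_idx, hyp_idx) pairs in order; -1 marks a gap
    (deletion when hyp_idx == -1, insertion when ref_idx == -1).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the bottom-right corner, preferring match/substitution.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, -1))  # reference word deleted in ASR output
            i -= 1
        else:
            pairs.append((-1, j - 1))  # ASR word inserted
            j -= 1
    return pairs[::-1]


def transfer_labels(ref: List[str], ref_ends: List[bool], hyp: List[str]) -> List[bool]:
    """Copy sentence-end labels from reference words onto ASR words (hypothetical scheme)."""
    hyp_ends = [False] * len(hyp)
    last_hyp = -1  # most recent ASR word seen in the alignment
    for r, h in align(ref, hyp):
        if h != -1:
            last_hyp = h
        if r != -1 and ref_ends[r]:
            # If the sentence-final reference word was deleted, fall back
            # to the nearest preceding ASR word.
            target = h if h != -1 else last_hyp
            if target >= 0:
                hyp_ends[target] = True
    return hyp_ends


def slot_error_rate(ref_ends: List[bool], pred_ends: List[bool]) -> float:
    """SER = (false alarms + misses) / number of reference sentence ends."""
    misses = sum(r and not p for r, p in zip(ref_ends, pred_ends))
    false_alarms = sum(p and not r for r, p in zip(ref_ends, pred_ends))
    n_ref = sum(ref_ends)
    if n_ref == 0:
        raise ValueError("no reference sentence ends")
    return (false_alarms + misses) / n_ref


if __name__ == "__main__":
    ref = "we went home then we slept".split()
    ref_ends = [False, False, True, False, False, True]
    hyp = "we want home then slept".split()  # one substitution, one deletion
    labels = transfer_labels(ref, ref_ends, hyp)
    print(labels)  # [False, False, True, False, True]
    print(slot_error_rate(labels, [False, False, False, False, True]))  # 0.5
```

Labels produced this way can then serve as training targets for a CRF trained directly on ASR output, so that training and test conditions share the same recognition noise. A relative SER gain, such as the 11% quoted above, is computed as (SER_baseline - SER_new) / SER_baseline.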

Similar resources

Multi-pass sentence-end detection of lecture speech

Making speech recognition output readable is an important task. The first step here is automatic sentence end detection (SED). We introduce novel F0 derivative-based features and sentence end distance features for SED that yield significant improvements in slot error rate (SER) in a multi-pass framework. Three different SED approaches are compared on a spoken lecture task: hidden event language...

Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners

Word Ordering Errors (WOEs) are the most frequent type of grammatical errors at sentence level for non-native Chinese language learners. Learners taking Chinese as a foreign language often place character(s) in the wrong places in sentences, and that results in wrong word(s) or ungrammatical sentences. Besides, there are no clear word boundaries in Chinese sentences. That makes WOEs detection a...

Integrating sequence information in the audio-visual detection of word prominence in a human-machine interaction scenario

Modifying the articulatory parameters to raise the prominence of a segment of an utterance (hyperarticulating) is usually accompanied by a reduction of these parameters (hypoarticulation) for the neighboring segments. In this paper we investigate different approaches for the automatic labeling of the prominence of words. In particular, we investigate how the information in the sequence can be u...

Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking

In spoken language, sentence boundaries are much less explicit than in written language. Since conventional natural language processing (NLP) techniques are generally designed assuming the sentence boundaries are already given, it is crucial to detect the boundaries accurately for applying such NLP techniques to spoken language. Classification frameworks, such as Support Vector Machines (SVMs) ...

A HMM POS Tagger for Micro-blogging Type Texts

The high volume of communication via micro-blogging type messages has created an increased demand for text processing tools customised to the unstructured text genre. The available text processing tools developed on structured texts have been shown to deteriorate significantly when used on unstructured, micro-blogging type texts. In this paper, we present the results of testing an HMM-based POS (Par...

Publication date: 2015